Human coding and computational text analysis are more powerful when combined. I show how search and text-reuse tools can aid common hand-coding tasks. Human coding can both inform and be informed by rule-based information extraction, an iterative process of structuring queries on unstructured text.
Applying this method to public comments on U.S. federal agency rules, a sample of 10,894 hand-coded comments yields 41 million as-good-as-hand-coded comments regarding both the organizations that mobilized them and the extent to which policy changed in the direction they sought. This large sample enables new analyses of lobbying coalitions, social movements, and policy change.
Workflow: the googlesheets4 R package makes it possible to analyze and improve coded data in real time, as in Fig. 1.
Fig. 1: Coded Comments in a Google Sheet
| Entity | Pattern |
|---|---|
| 3M Co | `3M Co\|3M Cogent\|3M Health Information Systems\|Ceradyne\|Cogent Systems\|Hybrivet Systems` |
| Teamsters Union | `Brotherhood of Locomotive Engineers & Trainmen\|Brotherhood of Maint of Way Employ Div\|New England Teamsters & Trucking Pension\|Teamsters Airline Express Delivery Div\|Teamsters Local 357\|Teamsters Union\|Western Conf of Teamsters Pension Trust` |
Fig. 2: Iteratively Building Regex Tables
For example, the legislators package adds common name variants (e.g., “AOC”) to standard legislator names so that searches also catch informal references.
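A minimal Python sketch of the regex-table loop, assuming a plain dictionary stands in for the shared sheet in the Entity | Pattern layout (the helper names and sample comments are hypothetical, not the author's code): each entity's variants are compiled into one case-insensitive alternation, comments are tagged, and a variant found while hand-coding an unmatched comment is appended before the next pass.

```python
import re

# Hypothetical regex table: entity -> alternation of name variants.
regex_table = {
    "3M Co": "3M Co|3M Cogent|Ceradyne|Cogent Systems|Hybrivet Systems",
    "Teamsters Union": "Teamsters Local 357|Teamsters Union|Western Conf of Teamsters Pension Trust",
}


def compile_table(table):
    """Compile each entity's alternation, anchored on word boundaries."""
    return {
        entity: re.compile(r"\b(?:" + pattern + r")\b", re.IGNORECASE)
        for entity, pattern in table.items()
    }


def match_entities(text, compiled):
    """Return the sorted entities whose patterns appear in the text."""
    return sorted(entity for entity, rx in compiled.items() if rx.search(text))


def add_variant(table, entity, variant):
    """Append a newly hand-coded variant, escaped so '&', '.', etc. match literally."""
    escaped = re.escape(variant)
    table[entity] = table[entity] + "|" + escaped if entity in table else escaped


comments = [
    "Submitted on behalf of Teamsters Local 357 members.",
    "Ceradyne opposes the proposed rule.",
    "Comment from the Brotherhood of Locomotive Engineers & Trainmen.",
]

compiled = compile_table(regex_table)
print([match_entities(c, compiled) for c in comments])

# One iteration: a coder reads the unmatched third comment and adds its variant.
add_variant(regex_table, "Teamsters Union", "Brotherhood of Locomotive Engineers & Trainmen")
compiled = compile_table(regex_table)
print([match_entities(c, compiled) for c in comments])
```

Because patterns are anchored on word boundaries, a variant that ends in punctuation (e.g., "Inc.") may fail to match before a following space and would need a different anchor.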
Of 58 million public comments on proposed agency rules, the top 100 organizations mobilized 43,938,811. The top ten organizations mobilized 25,947,612.
| Organization | Rules Lobbied On | Pressure Campaigns | Percent (Campaigns / Rules) | Comments | Average Comments per Campaign |
|---|---|---|---|---|---|
| NRDC | 530 | 62 | 11.7% | 5,939,264 | 95,795 |
| Sierra Club | 591 | 110 | 18.6% | 5,111,922 | 46,472 |
| CREDO | 90 | 41 | 45.6% | 3,019,150 | 73,638 |
| Environmental Defense Fund | 111 | 31 | 27.9% | 2,849,517 | 91,920 |
| Center For Biological Diversity | 572 | 86 | 15.0% | 2,815,509 | 32,738 |
| Earthjustice | 235 | 59 | 25.1% | 2,080,583 | 35,264 |
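The two derived columns can be recomputed from the raw counts as a quick consistency check, assuming Percent = Pressure Campaigns / Rules Lobbied On and Average = Comments / Pressure Campaigns (the values shown match these formulas):

```python
# Raw counts copied from the table: (rules lobbied on, pressure campaigns, comments).
rows = {
    "NRDC": (530, 62, 5_939_264),
    "Sierra Club": (591, 110, 5_111_922),
    "CREDO": (90, 41, 3_019_150),
    "Environmental Defense Fund": (111, 31, 2_849_517),
    "Center For Biological Diversity": (572, 86, 2_815_509),
    "Earthjustice": (235, 59, 2_080_583),
}


def derived(rules, campaigns, comments):
    """Return (percent of lobbied rules with a campaign, average comments per campaign)."""
    percent = round(100 * campaigns / rules, 1)
    average = round(comments / campaigns)
    return percent, average


for org, counts in rows.items():
    percent, average = derived(*counts)
    print(f"{org}: {percent}% of rules, {average:,} comments per campaign")
```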
Fig. 3: Iteratively Grouping Documents
Fig. 4: Identifying Groups of Linked Documents with Text Reuse (a 10-gram Window Function)
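One way to turn 10-gram overlap into groups of linked documents is to treat each shared n-gram as a link and take connected components. The sketch below swaps in a union-find structure for that step; it is an illustration under that assumption, not the author's implementation, and the function names and sample comments are hypothetical.

```python
from collections import defaultdict


def ngrams(text, n=10):
    """Set of lowercased word n-grams from a sliding window over the text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def group_by_text_reuse(docs, n=10):
    """Group documents into connected components linked by shared n-grams."""
    parent = list(range(len(docs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    first_seen = {}  # n-gram -> index of the first document containing it
    for idx, doc in enumerate(docs):
        for gram in ngrams(doc, n):
            if gram in first_seen:
                parent[find(idx)] = find(first_seen[gram])  # union the two groups
            else:
                first_seen[gram] = idx

    groups = defaultdict(list)
    for idx in range(len(docs)):
        groups[find(idx)].append(idx)
    return sorted(groups.values())


template = "I urge the agency to adopt the strongest possible protections for clean air and water"
docs = [
    template + " now.",
    "As a parent, " + template,
    "Please withdraw this rule because it raises costs for small businesses in my town.",
]
print(group_by_text_reuse(docs))
```

Linking is transitive: if A shares a 10-gram with B and B with C, all three land in one group even when A and C share nothing. Documents shorter than n words have no n-grams and stay singletons.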
Fig. 5: Public Comments on Regulations.gov, 2005-2020
Comments that share a 10-gram (a sequence of ten consecutive words) with 99 or more other comments are classified as part of a mass-comment campaign.
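Read literally, the threshold flags a comment if at least one of its 10-grams appears in 100 or more comments (itself plus 99 others). A minimal sketch under that reading, with hypothetical names and toy data; at the scale of millions of comments one would likely hash the n-grams or store them in a database rather than keep Python tuples in memory.

```python
from collections import defaultdict


def ngrams(text, n=10):
    """Set of lowercased word n-grams from a sliding window over the text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def flag_mass_comments(docs, n=10, min_others=99):
    """Flag each doc that shares at least one n-gram with min_others or more other docs."""
    doc_grams = [ngrams(d, n) for d in docs]
    counts = defaultdict(int)  # n-gram -> number of docs containing it
    for grams in doc_grams:
        for g in grams:
            counts[g] += 1
    # counts[g] includes the doc itself, so the number of *other* docs is counts[g] - 1.
    return [any(counts[g] - 1 >= min_others for g in grams) for grams in doc_grams]


form = "Please protect our public lands from drilling and mining for future generations to enjoy."
docs = [f"Dear Administrator, {form} Sincerely, Commenter {i}" for i in range(100)]
docs.append("I support the rule as proposed because it lowers my energy bills every month.")
flags = flag_mass_comments(docs)
print(sum(flags))  # → 100
```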
Preprocessing tip: summaries speed hand-coding (e.g., use textrank to select representative sentences).
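The idea behind TextRank summaries can be sketched from scratch: sentences are nodes, edge weights are word-overlap similarities, PageRank scores the nodes, and the top-scoring sentences are kept. This sketch uses Jaccard overlap as the similarity and a fixed number of power iterations; both are simplifying assumptions, and it does not reproduce the textrank package's interface or defaults.

```python
import re
from itertools import combinations


def split_sentences(text):
    """Naive splitter: break after ., !, or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def jaccard(a, b):
    """Word-overlap similarity between two sentences (punctuation ignored)."""
    wa = set(re.findall(r"\w+", a.lower()))
    wb = set(re.findall(r"\w+", b.lower()))
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def textrank(sentences, damping=0.85, iters=50):
    """Score sentences with weighted PageRank on the similarity graph."""
    n = len(sentences)
    w = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        w[i][j] = w[j][i] = jaccard(sentences[i], sentences[j])
    out_weight = [sum(row) for row in w]
    scores = [1.0] * n
    for _ in range(iters):
        # Each sentence receives score from neighbors in proportion to edge weight.
        scores = [
            (1 - damping)
            + damping * sum(
                w[j][i] / out_weight[j] * scores[j]
                for j in range(n)
                if out_weight[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(text, k=2):
    """Return the k highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    scores = textrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


comment = (
    "I oppose the proposed rule. The proposed rule would weaken clean water "
    "protections. My family fishes in the river every summer. Please keep "
    "strong clean water protections in the final rule."
)
print(summarize(comment, k=1))
```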
Fig. 6: Lobbying Success by Campaign Size
Public pressure on climate and environmental justice greatly affected policy documents (Fig. 7), but a few organizations dominate lobbying coalitions (Table 2). When tribal governments or local groups lobby without the support of national advocacy organizations, policymakers typically ignore them.
Fig. 7: Policy Text Change by Coalition Size
Alternatives for linking entity names: linkit, fastLink, or machine learning with a hand-coded training set.
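As a rough stand-in for record-linkage tools such as linkit and fastLink, fuzzy string matching can link messy organization names that regex patterns miss to canonical entities. The sketch below uses the similarity ratio from Python's standard-library difflib with an arbitrary 0.85 cutoff; it is not how linkit or fastLink work (fastLink, for instance, fits a probabilistic record-linkage model), and the canonical list is hypothetical.

```python
import difflib
import re

# Hypothetical canonical organization names (e.g., the Entity column of a regex table).
CANONICAL = [
    "Natural Resources Defense Council",
    "Sierra Club",
    "Earthjustice",
    "Center for Biological Diversity",
]


def normalize(name):
    """Lowercase, drop punctuation, and collapse whitespace."""
    return " ".join(re.findall(r"[a-z0-9]+", name.lower()))


LOOKUP = {normalize(c): c for c in CANONICAL}


def fuzzy_link(name, cutoff=0.85):
    """Return the most similar canonical name, or None if none clears the cutoff."""
    hits = difflib.get_close_matches(normalize(name), list(LOOKUP), n=1, cutoff=cutoff)
    return LOOKUP[hits[0]] if hits else None


for raw in ["SIERRA CLUB.", "Center for Biological Diversty", "Chamber of Commerce"]:
    print(raw, "->", fuzzy_link(raw))
```

A fixed cutoff trades missed matches against false links; in a hand-coding workflow, borderline scores are natural candidates for human review.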